Discounted approximations of undiscounted stochastic games and Markov decision processes are already poor in the almost deterministic case

Authors

Endre Boros

Khaled Elbassioni

Vladimir Gurvich

Kazuhisa Makino

Abstract

It is shown that the discount factor needed to solve an undiscounted mean payoff stochastic game to optimality is exponentially close to 1, even in one-player games with a single random node and polynomially bounded rewards and transition probabilities. On the other hand, for the class of so-called irreducible games with perfect information and a constant number of random nodes, we obtain a pseudo-polynomial algorithm using discounts.
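As a rough illustration of why the discount factor matters, the following Python sketch (not taken from the paper) compares two reward streams in a hypothetical deterministic one-player choice. It assumes the standard normalized discounted value (1 - beta) * sum_t beta^t * r_t; the function names and the two streams are illustrative assumptions. One stream pays 0 for k steps and then 1 forever (mean payoff 1), the other pays 0.5 forever (mean payoff 0.5). This toy only shows the threshold discount factor drifting toward 1 as k grows; it does not reproduce the exponential bound stated in the abstract, which involves a random node.

```python
def discounted_value(prefix, cycle, beta):
    """Normalized discounted value (1 - beta) * sum_t beta**t * r_t of the
    reward stream `prefix` followed by `cycle` repeated forever (0 < beta < 1)."""
    head = sum(r * beta**t for t, r in enumerate(prefix))
    loop = sum(r * beta**t for t, r in enumerate(cycle))
    # Geometric series over repetitions of the cycle, shifted past the prefix.
    tail = beta ** len(prefix) * loop / (1 - beta ** len(cycle))
    return (1 - beta) * (head + tail)


def mean_payoff(cycle):
    """Long-run average reward of a stream that eventually repeats `cycle`."""
    return sum(cycle) / len(cycle)


def prefers_delayed(k, beta):
    """True if the discounted criterion picks the mean-payoff-optimal stream."""
    delayed = discounted_value([0.0] * k, [1.0], beta)  # equals beta**k
    steady = discounted_value([], [0.5], beta)          # equals 0.5
    return delayed > steady


if __name__ == "__main__":
    for k in (10, 100):
        for beta in (0.9, 0.99, 0.999):
            print(k, beta, prefers_delayed(k, beta))
```

In this toy the delayed stream wins under discounting only when beta**k > 0.5, so a longer delay k demands a discount factor closer to 1 before the discounted and mean payoff criteria agree.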


Similar resources

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...


Every stochastic game with perfect information admits a canonical form

We consider discounted and undiscounted stochastic games with perfect information in the form of a natural BWR-model with positions of three types: V_B (Black), V_W (White), and V_R (Random). These BWR-games lie in the complexity class NP∩CoNP and contain the well-known cyclic games (when V_R is empty) and Markov decision processes (when V_B or V_W is empty). We show that the BWR-model is polynomial-time equiv...


Second Order Optimality in Transient and Discounted Markov Decision Chains

The article is devoted to second order optimality in Markov decision processes. Attention is primarily focused on the reward variance for discounted models and undiscounted transient models (i.e. where the spectral radius of the transition probability matrix is less than unity). Considering the second order optimality criteria means that in the class of policies maximizing (or minimiz...


Markov Decision Processes and Stochastic Games with Total Effective Payoff

We consider finite Markov decision processes (MDPs) with undiscounted total effective payoff. We show that there exist uniformly optimal pure stationary strategies that can be computed by solving a polynomial number of linear programs. We apply this result to two-player zero-sum stochastic games with perfect information and undiscounted total effective payoff, and derive the existence of a sadd...


Discounting in Games across Time Scales

We introduce two-level discounted games played by two players on a perfect-information stochastic game graph. The upper level game is a discounted game and the lower level game is an undiscounted reachability game. Two-level games model hierarchical and sequential decision making under uncertainty across different time scales. We show the existence of pure memoryless optimal strategies for both...



Publication date: 2012
